Goto

Collaborating Authors

 out-of-fold prediction


Enhancing binary classification: A new stacking method via leveraging computational geometry

arXiv.org Artificial Intelligence

Binary classification is a fundamental task in machine learning and data science, with applications spanning numerous domains, including spam detection, medical diagnostics, image recognition, credit scoring. The goal is to predict a binary outcome--typically labeled as 0 or 1--based on a set of input features. Various machine learning algorithms, such as logistic regression (LR), k-nearest neighbors (kNN), support vector machines (SVM), and neural network (NN), are commonly employed for binary classification tasks. These algorithms can be mainly divided into two categories: those with interpretability, which are convenient for analysis and control (e.g., LR); and those without interpretability but with potentially good classification performance (e.g., NN). Ensemble learning, a powerful technique in predictive modeling, has gained widespread recognition for its ability to improve model performance by combining the strengths of multiple learning algorithms [1]. Among ensemble methods, stacking stands out by integrating the predictions of diverse base models (different learning algorithms) through a meta-model, resulting in enhanced prediction accuracy compared to only using the best base model [2]. Stacking has demonstrated significant applications in classification problems such as network intrusion detection [3, 4], cancer type classification [5], credit lending [6], and protein-protein binding affinity prediction [7].


How to Use Out-of-Fold Predictions in Machine Learning

#artificialintelligence

Machine learning algorithms are typically evaluated using resampling techniques such as k-fold cross-validation. During the k-fold cross-validation process, predictions are made on test sets comprised of data not used to train the model. These predictions are referred to as out-of-fold predictions, a type of out-of-sample predictions. Out-of-fold predictions play an important role in machine learning in both estimating the performance of a model when making predictions on new data in the future, so-called the generalization performance of the model, and in the development of ensemble models. In this tutorial, you will discover a gentle introduction to out-of-fold predictions in machine learning. How to Use Out-of-Fold Predictions in Machine Learning Photos by Gael Varoquaux, some rights reserved.


Dataiku's Solution to SPHERE's Activity Recognition Challenge

arXiv.org Machine Learning

Our team won the second prize of the Safe Aging with SPHERE Challenge organized by SPHERE, in conjunction with ECML-PKDD and Driven Data. The goal of the competition was to recognize activities performed by humans, using sensor data. This paper presents our solution. It is based on a rich pre-processing and state of the art machine learning methods. From the raw train data, we generate a synthetic train set with the same statistical characteristics as the test set. We then perform feature engineering. The machine learning modeling part is based on stacking weak learners through a grid searched XGBoost algorithm. Finally, we use post-processing to smooth our predictions over time.